Method and system for recording streams

ABSTRACT

A method of delivering a live stream includes recording the stream using a recording tier, and playing the stream using a player tier. The step of recording the stream includes sub-steps that begin when the stream is received in a source format. The stream is then converted into an intermediate format (IF), which is an internal format for delivering the stream within an overlay network. The player process begins when a requesting client is associated with a network proxy. In response to receipt at the proxy of a request for the stream or a portion thereof, the proxy retrieves (either from the archive or the data store) a stream manifest and at least one fragment index. Using the fragment index, the intermediate format file fragments are retrieved to the proxy, converted to a target format, and then served in response to the client request.

BACKGROUND OF THE INVENTION

1. Technical Field

This application relates generally to online delivery of high definition (HD) video at broadcast audience scale to popular runtime environments and mobile devices.

2. Brief Description of the Related Art

Distributed computer systems are well-known in the prior art. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” means the storage, caching, or transmission of content, streaming media and applications on behalf of content providers, together with ancillary technologies such as, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.

While content delivery networks provide significant advantages, typically they include dedicated platforms to support delivery of content for multiple third party runtime environments that are, in turn, based on their own proprietary technologies, media servers, and protocols. These distinct platforms are costly to implement and to maintain, especially globally and at scale as the number of end users increases. At the same time, content providers (such as large-scale broadcasters, film distributors, and the like) want their content delivered online in a manner that complements traditional mediums such as broadcast TV (including high definition or “HD” television) and DVD. This content may also be provided at different bit rates. End users also want to interact with the content as they can now with traditional DVR-based content delivered over satellite or cable. A further complication is that Internet-based content delivery is no longer limited to fixed line environments such as the desktop, as more and more end users now use mobile devices such as the Apple® iPhone® to receive and view content over mobile environments.

Thus, there is a need to provide an integrated content delivery network platform with the ability to deliver online content (such as HD-quality video) at broadcast audience scale to the most popular runtime environments (such as Adobe® Flash®, Microsoft® Silverlight®, etc.) as well as to mobile devices such as the iPhone to match what viewers expect from traditional broadcast TV. The techniques disclosed herein address this need.

BRIEF SUMMARY

This disclosure describes an integrated HTTP-based delivery platform that provides for the online delivery of HD-quality video and audio content to popular runtime environments operating on multiple types of client devices in both fixed line and mobile environments.

In one embodiment, a method of delivering a live stream is implemented within a content delivery network (CDN) and includes the high level functions of recording the stream using a recording tier, and playing the stream using a player tier. The step of recording the stream includes a set of sub-steps that begins when the stream is received at a CDN entry point in a source format. The stream is then converted into an intermediate format (IF), which is an internal format for delivering the stream within the CDN and comprises a stream manifest, a set of one or more fragment indexes (FI), and a set of IF fragments. The fragments representing a current portion of the stream are archived in the intermediate format in an archiver, while older (less current) portions are sent to a data store. The player process begins when a requesting client is associated with a CDN HTTP proxy. In response to receipt at the HTTP proxy of a request for the stream or a portion thereof, the HTTP proxy retrieves (either from the archive or the data store) the stream manifest and at least one fragment index. Using the fragment index, the IF fragments are retrieved to the HTTP proxy, converted to a target format, and then served in response to the client request. The source format may be the same as or different from the target format. Preferably, all fragments are accessed, cached and served by the HTTP proxy via HTTP.

In another embodiment, a method of delivering a stream on-demand (VOD) uses a translation tier to manage the creation and/or handling of the IF components, i.e., the stream manifest, the fragment indexes (FI), and the IF fragments. The translation tier is used in lieu of the recording tier (in the live delivery network). In one VOD embodiment, the translation tier is implemented using an HTTP proxy and a translation process. The approach enables VOD streaming from customer and CDN-based storage origins, provides single and multiple bit rate (SBR and MBR) streaming, provides support for origin content stored in multiple different types of file format containers (supported mp4/flv codecs include, among others, AAC, MP3, PCM for audio, and H.264 for video), and minimizes download of content beyond what is directly requested by the end user.

According to another aspect of this disclosure, Intermediate Format (IF) generation and handling may occur entirely within an HTTP proxy. In this approach, IF can be extended throughout the entire downstream HTTP delivery chain including, optionally, to the client itself (if the client also has an HTTP proxy interface).

The foregoing has outlined some of the more pertinent features of the disclosed subject matter. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed subject matter in a different manner or by modifying the subject matter as will be described.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the subject matter and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating a known distributed computer system configured as a content delivery network (CDN);

FIG. 2 is a representative CDN edge machine configuration;

FIG. 3 illustrates a network for HTTP-based delivery of high definition (HD) “live” video to clients across both fixed line and mobile environments according to the teachings of this disclosure;

FIG. 4 shows the network of FIG. 3 in additional detail;

FIG. 5 illustrates a representative packet flow across the network of FIG. 3 from a source format (SF) to a target format (TF) using Intermediate Format (IF) fragments according to the disclosed technique;

FIG. 6 illustrates another view of the flow of media packets into and out of the streaming server framework;

FIG. 7 illustrates how the network for HTTP-based delivery is used to provide video on demand (VOD) stream delivery;

FIG. 8 illustrates a representative translation machine configuration of the VOD portion of the HTTP-based delivery network;

FIG. 9 illustrates a set of C++ classes executing in an edge machine that facilitate a muxer functionality;

FIGS. 10A-10B illustrate an interaction between a client player and an edge ghost process;

FIG. 11 is a more detailed illustration of a set of representative C++ classes that comprise an edge server process;

FIG. 12 illustrates how an RTMP Puller interacts with an EP and one or more Archivers;

FIG. 13 illustrates how a multi-bit rate (MBR) stream flows from an encoder to the RTMP Puller; and

FIG. 14 describes the interaction between the RTMP Puller and the Archiver using the RTMP Puller's archiver-access adaptor.

DETAILED DESCRIPTION

FIG. 1 illustrates a known distributed computer system that (as described below) is extended by the techniques herein to provide a single HTTP-based platform with the ability to deliver online HD video at broadcast audience scale to the most popular runtime environments and to the latest devices in both fixed line and mobile environments.

In this representative embodiment, a distributed computer system 100 is configured as a content delivery network (CDN) and is assumed to have a set of machines 102 a-n distributed around the Internet. Typically, most of the machines are servers located near the edge of the Internet, i.e., at or adjacent end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites, such as web site 106, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to “edge” servers. Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End users that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. Although not shown in detail, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the edge servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism 115, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the edge servers.

As illustrated in FIG. 2, a given machine 200 in the CDN (sometimes referred to herein as an “edge machine”) comprises commodity hardware (e.g., an Intel Pentium processor) 202 running an operating system kernel (such as Linux or variant) 204 that supports one or more applications 206 a-n. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy 207 (also referred to herein as an “edge server process”), a name server 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 typically comprises a cache, and a manager process for managing the cache and delivery of content from the edge machine. For streaming media, the machine typically includes one or more media servers, such as a Windows Media Server (WMS) or Flash 2.0 server, as required by the supported media formats. When configured as a CDN “edge” machine (or “edge server”), the machine shown in FIG. 2 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, customer-specific basis, preferably using configuration files that are distributed to the edge servers using a configuration system. A given configuration file preferably is XML-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to the CDN edge server via the data transport mechanism. U.S. Pat. No. 7,111,057 illustrates a useful infrastructure for delivering and managing edge server content control information, and this and other edge server control information (sometimes referred to as “metadata”) can be provisioned by the CDN service provider itself, or (via an extranet or the like) the content provider customer who operates the origin server.

The CDN may include a storage subsystem, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference.

The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.

For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, the disclosure of which is incorporated herein by reference.

This disclosure describes how the above-identified technologies can be extended to provide an integrated HTTP-based delivery platform that provides for the online delivery of HD-video quality content to the most popular runtime environments and to the latest devices in both fixed line and mobile environments. The platform supports delivery of both “live” and “on-demand” content.

Live Streaming Delivery

As used herein, the following terms shall have the following representative meanings. For convenience of illustration only, the description that follows (with respect to live streaming delivery) is in the context of the Adobe Flash runtime environment, but this is not a limitation, as a similar type of solution may also be implemented for other runtime environments both fixed line and mobile (including, without limitation, Microsoft Silverlight, Apple iPhone, and others).

An Encoder is a customer-owned or managed machine which takes some raw live video feed in some format (streaming, satellite, etc.) and delivers the data, encoded for streaming delivery, to an Entry Point. An Entry Point (EP) typically is a process running on a CDN streaming machine which receives video data from the customer's Encoder and makes this data available to consumers of the live stream. For Adobe Flash, this is a Flash Media Server (FMS) configured to accept connections from Encoders. A Flash Media Server is a server process for Flash media available from Adobe Corporation. In this embodiment, an Intermediate Region (IR) typically is a Flash Media Server which the CDN has configured to act analogously to a streaming set reflector, such as described in U.S. Pat. No. 7,296,082 and U.S. Pat. No. 6,751,673. These machines relay streams from FMS EPs to FMS Edge regions, providing fan-out and path diversity. A “Region” typically implies a set of machines (and their associated server processes) that are co-located and are interconnected to one another for load sharing, typically over a back-end local area network. A Flash Edge machine is a Flash Media Server which has been configured to accept client requests. FMS is thus the software running on the Flash EP, IR, and Edge machines in a representative embodiment. Intermediate Format (IF) is an internal (to the CDN) format for sending streaming data from EP to an edge server HTTP proxy. As will be described in more detail below, IF preferably comprises several different pieces, including “Stream Manifest,” “Fragment Indexes,” and “IF Fragments.” Live, DVR and VOD are defined as follows: “Live” refers to media served in real time as an event occurs; “DVR” refers to serving content acquired from a “live” feed but served at a later time; “VOD” refers to media served from a single, complete (i.e., not incrementally changing) file or set of files. Real Time Messaging Protocol (RTMP) is the streaming and RPC protocol used by Flash.
Real Time Messaging Protocol Encrypted (RTMPE) is the encrypted version of RTMP using secrets built into the server and client. “SWF” or “Small Web Format” is the format for Flash client applications. SWF verification refers to a technique by which the Flash Player can authenticate to FMS that it is playing an unmodified SWF by sending hashes of the SWF itself along with secrets embedded in the client and server.

FIG. 3 illustrates an overview of a preferred architecture for live streaming delivery. A simplified version of this architecture is shown in FIG. 4. As can be seen in FIG. 3, the system generally is divided into two independent tiers: a stream recording tier 300, and a stream player tier 302. As will be described, the recording process (provided by the stream recording tier 300) is initiated from the Encoder 304 forward. Preferably, streams are recorded even if there are currently no viewers (because there may be DVR requests later). The playback process (provided by the stream player tier 302) plays a given stream starting at a given time. Thus, a “live stream,” in effect, is equivalent to a “DVR stream” with a start time of “now.”

Referring to FIG. 3, the live streaming process begins with a stream delivered from an Encoder 304 to an Entry Point 306. An RTMP Puller component 308 (e.g., running on a Linux-based machine) in an EP Region (not shown) is instructed to subscribe to the stream on the EP 306 and to push the resulting data to one or more Archiver 310 processes, preferably running on other machines. As illustrated, one of the Archivers 310 may operate as the “leader” as a result of executing a leader election protocol across the archiving processes. Preferably, the Archivers 310 act as origin servers for the edge server HTTP proxy processes (one of which is shown at 312) for live or near-live requests. The edge server HTTP proxy 312 provides HTTP delivery to requesting end user clients, one of which is the Client 314. A “Client” is a device that includes appropriate hardware and software to connect to the Internet, that speaks at least HTTP, and that includes a content rendering engine. The Client device type will vary depending on whether the device connects to the Internet over a fixed line environment or a mobile environment. A representative client is a computer that includes a browser, typically with native or plug-in support for media players, codecs, and the like. If DVR is enabled, content preferably is also uploaded to the Storage subsystem 316, so that the Storage subsystem serves as the origin for DVR requests as will be described.
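The disclosure says only that one Archiver becomes the “leader” by running a leader election protocol; it does not name the protocol. As a minimal sketch, assuming each Archiver publishes periodic heartbeats and that the lowest-numbered live Archiver wins, the selection step might look like the following (the `ArchiverState` and `elect_leader` names and the 5-second timeout are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ArchiverState:
    archiver_id: int       # hypothetical unique identifier per Archiver process
    last_heartbeat: float  # seconds, on a clock shared by all participants


def elect_leader(archivers, now, timeout=5.0):
    """Return the ID of the leader Archiver, or None if none is alive.

    Assumption: an Archiver counts as alive if its last heartbeat is at
    most `timeout` seconds old, and the lowest live ID becomes leader.
    """
    live = [a.archiver_id for a in archivers if now - a.last_heartbeat <= timeout]
    return min(live) if live else None
```

A production system would more likely rely on an established lease or consensus mechanism; the lowest-ID rule is used here only because its outcome is easy to check.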

As also seen in FIG. 3, the content provider may choose to deliver two copies of the stream, a primary copy and a backup copy, to allow the stream to continue with minimal interruption in the event of network or other problems. Preferably, the primary and backup streams are treated as independent throughout the system up through the edge server HTTP proxy, which preferably has the capability of failing over from the primary to the backup when the primary is having difficulties, and vice versa.
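The text does not define when a copy counts as “having difficulties.” The sketch below assumes a hypothetical rule: the proxy switches to the other copy after a fixed number of consecutive failed fetches, and switches back the same way. `StreamSourceSelector` and `max_failures` are illustrative names only.

```python
class StreamSourceSelector:
    """Hypothetical proxy-side choice between the primary and backup copies.

    Assumption: a copy is treated as failing after `max_failures`
    consecutive failed fetches, at which point the other copy is used.
    """

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.current = "primary"
        self.failures = 0

    def record_result(self, ok):
        """Record one fetch outcome and return the copy to use next."""
        if ok:
            self.failures = 0
            return self.current
        self.failures += 1
        if self.failures >= self.max_failures:
            # Fail over in either direction: primary -> backup or backup -> primary.
            self.current = "backup" if self.current == "primary" else "primary"
            self.failures = 0
        return self.current
```

Because the two copies are independent up to the proxy, the selector needs no coordination with upstream components; it only tracks its own fetch results.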

A content request (from an end user Client 314) is directed to the CDN edge machine HTTP proxy 312, preferably using techniques such as described in U.S. Pat. Nos. 6,108,703, 7,240,100, 7,293,093 and others. When an HTTP proxy 312 receives an HTTP request for a given stream, the HTTP proxy 312 makes various requests, preferably driven by HTTP proxy metadata (as described in U.S. Pat. Nos. 7,240,100, 7,111,057 and others), possibly via a cache hierarchy 318 (see, e.g., U.S. Pat. No. 7,376,716 and others) to learn about and download a stream to serve to the Client 314. Preferably, the streaming-specific knowledge is handled by the edge machine HTTP proxy 312 directly connected to a Client 314. Any go-forward (cache miss) requests (issued from the HTTP proxy) preferably are standard HTTP requests. In one embodiment, the content is delivered to the Client 314 from the HTTP proxy 312 as a progressive-download FLV file. As noted above, Adobe FLV is referenced herein by way of example only, as the disclosed architecture is not limited for use with Adobe FLV. For secure streams, preferably the Client 314 first authenticates to the HTTP proxy 312 using an edge server authentication technique and/or a SWF-verification back-channel.

When a Client 314 requests a particular stream, the HTTP proxy 312 (to which the client has been directed, typically via DNS) starts the streaming process by retrieving a “Stream Manifest” that preferably contains only slowly changing attributes of the stream and the information needed by the HTTP proxy to track down the actual stream content. The URL to download this manifest preferably is constructed deterministically from metadata delivered (e.g., via the distributed data transport mechanism of FIG. 1) to the HTTP proxy. Preferably, the manifest itself is stored in association with a Stream Manifest Manager system (not shown) and/or in the storage subsystem 316. Preferably, a Stream Manifest describes the various “tracks” that compose a stream, where preferably each track constitutes a different combination of bit rate and type, where type is “audio,” “video,” or “interleaved_AV.” The Stream Manifest preferably includes a sequence of “indexInfo” time ranges for each track that describe forward URL templates, stream properties, and various other parameters necessary for the HTTP proxy to request content for that time range.

For “live” requests, the HTTP proxy starts requesting content relative to “now,” which, in general, is approximately equal to the time on the edge machine HTTP proxy process. Given a seek time, the HTTP proxy downloads a “Fragment Index” whose name preferably is computed based on information in the indexInfo range and an epoch seek time. Preferably, each Fragment Index covers a given time period (e.g., a few minutes). By consulting the Fragment Index, an “Intermediate Format (IF) Fragment” number and an offset into that fragment are obtained. The HTTP proxy can then begin downloading the fragment (e.g., via the cache hierarchy 318, or from elsewhere within the CDN infrastructure), skipping data before the specified offset, and then begin serving (to the requesting Client) from there. Preferably, the IF fragments are sized for optimal caching by the HTTP proxy. In general, and unless the Stream Manifest indicates otherwise with a new indexInfo range, for live streaming the HTTP proxy then continues serving data from consecutively-numbered IF Fragments.
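The Fragment Index lookup described above can be sketched as a binary search over key-frame entries. The entry layout (`KeyframeEntry` holding a time, a fragment number and a byte offset) is an assumption modeled on the mappings the Fragment Index is said to carry; the actual FI encoding is not given in this description.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyframeEntry:
    time_ms: int          # presentation time of the key frame
    fragment_number: int  # IF Fragment that contains the key frame
    byte_offset: int      # offset of the key frame inside that fragment


def locate(entries, seek_ms):
    """Return (fragment_number, byte_offset) of the last key frame at or
    before seek_ms. `entries` must be sorted by time_ms (hypothetical layout)."""
    times = [e.time_ms for e in entries]
    i = bisect.bisect_right(times, seek_ms) - 1
    if i < 0:
        i = 0  # seek precedes the first key frame: start from the first one
    entry = entries[i]
    return entry.fragment_number, entry.byte_offset
```

The proxy would then skip `byte_offset` bytes of that fragment (for example, `payload[byte_offset:]`) before serving, and continue with the following fragment numbers for live play.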

As used herein, and in the context of live HTTP-based delivery, the Intermediate Format (IF) describes an internal representation of a stream used to get data from the RTMP Puller through to the edge machine HTTP proxy. A “source” format (SF) is a format in which the Entry Point 306 provides content and a “target” format (TF) is a format in which the edge machine HTTP proxy 312 delivers data to the Client 314. According to this disclosure, these formats need not be the same. Thus, SF may differ from TF; e.g., a stream may be acquired in FLV format and served in a dynamic or adaptive (variable bit rate) format. The format is the container used to convey the stream; typically, the actual raw audio and video chunks are considered opaque data, although transcoding between different codecs may be implemented as well. By passing the formats through the HTTP proxy (and delivering to the Client via conventional HTTP), the container used to deliver the content can be changed as long as the underlying codecs are managed appropriately.

Referring now to FIG. 4, the HTTP streaming architecture for live content may work as follows. At step 1, a content provider's encoder 404 pushes a live FLV stream to Entry Point (EP) 406. At step 2, the RTMP Puller 408 pulls the stream from the EP 406 and breaks it up into Intermediate Format (IF) file fragments and corresponding index information. A Demuxer process 405 facilitates this operation. The Puller 408 preferably uses metadata from a Stream Manifest file to determine how large to make each individual IF fragment. Preferably, and as noted above, IF fragment size is optimized for caching in the cache associated with an edge machine HTTP proxy.
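A minimal sketch of the Puller-side split in step 2, assuming the stream arrives as a sequence of timestamped tags and that the Stream Manifest supplies a target fragment size in bytes (both are assumptions; `Tag`, `Fragment` and `fragment_stream` are hypothetical names). It closes a fragment by size rather than by time, and records an index entry for every key frame:

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    time_ms: int
    is_keyframe: bool
    data: bytes


@dataclass
class Fragment:
    number: int
    payload: bytearray = field(default_factory=bytearray)


def fragment_stream(tags, target_size, first_number=0):
    """Split tags into fragments of roughly target_size bytes.

    Returns (fragments, index), where index holds one
    (time_ms, fragment_number, byte_offset) tuple per key frame.
    A fragment is closed once it reaches target_size, so boundaries
    follow size, not time.
    """
    fragments = [Fragment(first_number)]
    index = []
    for tag in tags:
        current = fragments[-1]
        if len(current.payload) >= target_size:
            current = Fragment(current.number + 1)
            fragments.append(current)
        if tag.is_keyframe:
            index.append((tag.time_ms, current.number, len(current.payload)))
        current.payload += tag.data
    return fragments, index
```

Sizing by bytes rather than by seconds matches the stated goal of optimizing fragments for HTTP proxy caching; the index entries are what the Archiver would append to the current Fragment Index.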

At step 3, the Archiver 410 retrieves from the Puller 408 the IF fragments along with their corresponding index information. The Archiver 410 appends the index information for each IF fragment to the current Fragment Index (FI) file. Preferably, the Archiver 410 caches a predetermined number of IF fragments for live play-back.

As the fragments age out, preferably they are deleted from the Archiver 410 and, at step 4, they are archived, e.g., to the Storage subsystem 416. Similarly, at set intervals (e.g., every few minutes), the Archiver 410 closes the current FI file, archives it to the Storage subsystem 416, and begins creating a new FI file.
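The Archiver behavior in steps 3 and 4 can be modeled as a bounded live window plus a rolling FI file. This is a hypothetical sketch: a plain dict stands in for the Storage subsystem, the key names `frag/<n>` and `fi/<start>` are invented, and FI files roll over on stream-time boundaries of `fi_interval_ms`.

```python
from collections import OrderedDict


class ArchiverSketch:
    """Hypothetical model of an Archiver's live window and FI rollover."""

    def __init__(self, storage, max_cached=3, fi_interval_ms=120_000):
        self.storage = storage                # dict-like stand-in for Storage
        self.max_cached = max_cached          # fragments kept for live play-back
        self.fi_interval_ms = fi_interval_ms  # time slice covered by one FI file
        self.fragments = OrderedDict()        # fragment_number -> payload, oldest first
        self.fi_start = None                  # start time of the open FI file
        self.fi_entries = []                  # index info appended to the open FI

    def add_fragment(self, number, start_ms, payload, index_entries):
        boundary = start_ms - start_ms % self.fi_interval_ms
        if self.fi_start is None:
            self.fi_start = boundary
        elif start_ms >= self.fi_start + self.fi_interval_ms:
            self._close_fi()                  # archive the finished FI file
            self.fi_start = boundary
        self.fi_entries.extend(index_entries)
        self.fragments[number] = payload
        while len(self.fragments) > self.max_cached:
            # Oldest fragment ages out of the live window and is archived.
            old_number, old_payload = self.fragments.popitem(last=False)
            self.storage[f"frag/{old_number}"] = old_payload

    def _close_fi(self):
        self.storage[f"fi/{self.fi_start}"] = list(self.fi_entries)
        self.fi_entries = []
```

In the live system the upload is performed by the leader Archiver once a file is complete; the sketch collapses that into the eviction and rollover points for brevity.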

At step 5, and after an end user Client 414 has been associated with a particular edge machine, the HTTP proxy 412 in that machine gets the fragments for live play-back and limited DVR time periods from the Archiver 410 (possibly via the cache hierarchy 418). Fragments no longer available on the Archiver 410 are retrieved from the Storage subsystem 416. A Muxer process 415 that operates in association with the HTTP proxy 412 facilitates this operation. Preferably, each IF fragment is a separate object for the HTTP proxy 412 that can be and is accessed through HTTP. In other words, according to this disclosure, the live stream is broken up into many small objects/fragments. The HTTP proxy 412 receives DVR commands from the Client player, typically on a separate HTTP connection. When the client player requests to begin playing from a new stream position, the HTTP proxy uses metadata from the Stream Manifest file to calculate which FI file contains the target time offset. The FI file is retrieved from the Archiver 410 or the storage subsystem 416 (or, alternatively, from a peer machine co-located with the edge machine) and identifies the IF fragment and byte offset from which to begin streaming to the client player.
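The fallback from the Archiver to the Storage subsystem can be expressed as an ordered list of sources. In this sketch each source is a hypothetical callable that returns the fragment bytes, or `None` when the object is not held there; a co-located peer or the cache hierarchy could be added to the list in the same way.

```python
def fetch_fragment(number, sources):
    """Try each (name, fetch) pair in order and return (name, data) from the
    first source that has the fragment. `fetch` is a hypothetical callable
    returning bytes, or None when the fragment is unavailable there."""
    for name, fetch in sources:
        data = fetch(number)
        if data is not None:
            return name, data
    raise LookupError(f"fragment {number} not found in any source")
```

For example, with `sources = [("archiver", archiver.get), ("storage", storage.get)]` over two dicts, a recent fragment is served from the Archiver and an aged-out one from Storage.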

FIG. 5 illustrates a representative packet flow from source format (SF) to target format (TF), although the conversion processes may be omitted (in other words, source format bits may be placed in the IF Fragment without additional format conversion). As noted above, preferably each video stream is broken into Fragments. Fragments are numbered consecutively starting at some arbitrary point (which can be determined by consulting the Fragment Index). The sequence may be discontinuous across Stream Manifest indexInfo ranges. Each Fragment preferably comprises header information describing the type of data enclosed. Following these headers is the IF payload, such as a sequence of FLV tags. A target format may also be just an encrypted form (such as one based on AES-128) of the elementary audio/video streams.
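The byte layout of the fragment header is not defined here. Purely for illustration, the sketch below assumes a fixed 14-byte big-endian header (a hypothetical `IFRG` magic, a version, a data-type code, the fragment number and the payload length) followed by an opaque payload; the separate fragment stream header mentioned later is omitted.

```python
import struct

# Hypothetical header layout, not taken from the source:
# 4-byte magic, 1-byte version, 1-byte data type, 4-byte fragment number,
# 4-byte payload length, all big-endian with no padding (14 bytes total).
HEADER = struct.Struct(">4sBBII")
MAGIC = b"IFRG"
DATA_TYPES = {0: "audio", 1: "video", 2: "interleaved_AV"}
TYPE_CODES = {name: code for code, name in DATA_TYPES.items()}


def pack_fragment(number, data_type, payload):
    """Prefix an opaque payload with the hypothetical header."""
    header = HEADER.pack(MAGIC, 1, TYPE_CODES[data_type], number, len(payload))
    return header + payload


def unpack_fragment(blob):
    """Parse the header; the payload is returned untouched (opaque)."""
    magic, version, code, number, length = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not an IF fragment")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated fragment")
    return {"version": version, "type": DATA_TYPES[code],
            "number": number, "payload": payload}
```

This mirrors the split the text describes: the proxy interprets only the header, while the payload (for example, FLV tags) passes through to the Muxer unchanged.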

The Fragment Indexes enable the HTTP proxy process (to which a particular Client has been associated) to find a frame around a desired “seek time.” Preferably, each Fragment Index file contains index information covering a fixed amount of time. The exact interval is stored in the Stream Manifest for each indexInfo range. The desired seek time (epoch time) can be rounded down to the nearest interval boundary to find the Fragment Index to request.
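The rounding step is integer arithmetic. A sketch, assuming FI boundaries are aligned to the start of the indexInfo range and that times are whole seconds (`fragment_index_start` is a hypothetical name):

```python
def fragment_index_start(seek_epoch_s, interval_s, range_start_s=0):
    """Round a seek time down to the start of the FI file that covers it.

    Assumption: FI boundaries are aligned to range_start_s, the start of
    the indexInfo range, and interval_s comes from the Stream Manifest.
    """
    if seek_epoch_s < range_start_s:
        raise ValueError("seek time precedes this indexInfo range")
    slices = (seek_epoch_s - range_start_s) // interval_s
    return range_start_s + slices * interval_s
```

For example, with a 120-second interval and boundaries aligned to epoch 0, a seek to 1,000,250 falls in the FI file that begins at 1,000,200; that start value can then be substituted into the manifest's FI URL pattern.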

Preferably, each stream is represented completely by the Stream Manifest, the Fragment Indexes and the IF Fragments. In an illustrative embodiment, the Stream Manifest is an XML file that contains the following information: stream epoch time (this time may be the time when the stream started or may be the oldest archived portion of the stream still available); stream properties (like bit rate, video size, codec information, etc.); information about fragment indexes and which URL pattern to use to request FI files; and the URL pattern for the fragments. The Fragment Index (FI) typically comprises the following: information about which key frame to start streaming from for a given time slice; key frame-to-fragment number mapping, key frame-to-time mapping, key frame-to-byte-offset (in that fragment) mapping, and so forth. Each IF Fragment contains approximately N seconds of stream, preferably optimized for HTTP proxy caching and not necessarily fragmented on time boundaries. Each fragment is composed of a fragment header, a fragment stream header and a payload, and each fragment is uniquely identified by its fragment number. Fragment numbers increase incrementally.
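The manifest's contents are listed above, but not its schema. The following sketch invents a small XML shape (the element names `streamManifest`, `properties` and `indexInfo`, and all attribute names, are hypothetical) and parses it with Python's standard `xml.etree.ElementTree` module:

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: epoch time, stream properties, and one indexInfo
# range carrying the FI interval and the FI and fragment URL patterns.
MANIFEST_XML = """\
<streamManifest epoch="1262304000">
  <properties bitrate="1500000" width="1280" height="720" codec="H.264/AAC"/>
  <indexInfo start="1262304000" fiInterval="120"
             fiUrl="/live/{stream}/fi/{start}"
             fragmentUrl="/live/{stream}/frag/{number}"/>
</streamManifest>
"""


def parse_manifest(xml_text):
    """Extract the fields an HTTP proxy would need from the manifest."""
    root = ET.fromstring(xml_text)
    props = root.find("properties").attrib
    ranges = [
        {
            "start": int(r.get("start")),
            "fi_interval": int(r.get("fiInterval")),
            "fi_url": r.get("fiUrl"),
            "fragment_url": r.get("fragmentUrl"),
        }
        for r in root.findall("indexInfo")
    ]
    return {
        "epoch": int(root.get("epoch")),
        "bitrate": int(props["bitrate"]),
        "codec": props["codec"],
        "ranges": ranges,
    }
```

Keeping one `indexInfo` element per time range lets a discontinuity (for example, a restart that resets fragment numbering) be expressed by appending a new range rather than rewriting the old one.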

Typically, and with reference back to FIG. 4, the Archiver 410 has the fragments for the most recent N minutes of the stream, and the rest of the fragments are on the Storage subsystem 416. The Archiver creates a Stream Manifest XML file for each stream, containing all the information that an HTTP proxy needs to make fragment and fragment index requests. For the Archiver to construct a Stream Manifest, preferably the RTMP Puller sends the stream properties downstream. Preferably, the IF Fragment is used to serve time-related data, i.e., actual video/audio bytes. Also, preferably the HTTP proxy (to which the Client has been associated) makes requests for IF Fragments only. Thus, it is desirable to isolate fragments from packets that have stream properties.

The Muxer subsystem 415 associated with (or within) the HTTP proxy determines how to request IF, converts IF to the output stream, and passes this data to the HTTP proxy for serving to the requesting client. In addition, preferably the HTTP proxy process supports a control channel by which the client can make any combination of various requests against an active stream including, without limitation, sessionToken, seek, and switch. The control channel facilitates flow control when working in some runtime environments, such as where the client lacks its own flow control facilities. In this situation, the control channel passes throttle commands that may be based on a percentage of an average bit rate (over the server-to-client connection) to help keep a target buffer on the client side of the connection full. A sessionToken request is a request to provide additional authentication information, e.g., via SWF Authentication. A “seek” is a request to start sending data as of a different time in the stream (including “jump to live”). A “switch” is a request to start sending data from a different track from the same Stream Manifest. This might be a bit rate switch and/or an angle change.
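The throttle formula is left open in the text. One hypothetical policy, shown below, maps the client's buffer level to a percentage of the average bit rate: it asks for more than 100% when the buffer is below target and less when it is above, so the buffer tends back toward the target. The `low` and `high` bounds and the linear ramps are illustrative assumptions.

```python
def throttle_percent(buffer_s, target_s, low=50.0, high=200.0):
    """Hypothetical throttle: percent of the average bit rate to request.

    Empty buffer -> `high`; buffer at target -> 100; buffer at or beyond
    twice the target -> `low`. Linear in between.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    if buffer_s <= target_s:
        # Below target: ramp from `high` (empty buffer) down to 100 (at target).
        return high - (high - 100.0) * (buffer_s / target_s)
    # Above target: ramp from 100 down to `low` at twice the target, then hold.
    excess = min(1.0, (buffer_s - target_s) / target_s)
    return 100.0 - (100.0 - low) * excess


def send_rate_bytes_per_s(avg_bitrate_bps, percent):
    """Convert a throttle percentage into a sending rate in bytes per second."""
    return avg_bitrate_bps * percent / 100.0 / 8.0
```

For a 1.5 Mbit/s stream whose client holds 5 s of a 10 s target buffer, this policy requests 150%, i.e., 281,250 bytes per second.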

As noted above, the HTTP proxy receives DVR commands from the client player, preferably on a separate HTTP connection. When the client player requests that playback begin from a new stream position, the HTTP proxy uses metadata from the Stream Manifest file to calculate which FI file contains the target time offset. The FI file is retrieved (e.g., from the Archiver or the Storage subsystem, or from a peer machine) and identifies the IF fragment and byte offset from which to begin streaming to the client player.

As described, the Stream Manifest preferably is an XML file and contains information about fragment indexes and how to construct the URL for an FI file, how to construct the URL for the “now” request, and how to construct the URL for the fragments. The HTTP proxy caches the manifest, which can be retrieved to the proxy either from an Archiver (which may be tried first) or from the Storage subsystem. Client players connect to the HTTP proxy to play the live stream (i.e., connect to the stream's “now” time). In response, the HTTP proxy makes a forward request to the Archiver to fetch the “now” time on a live stream. Metadata in the Stream Manifest is used by the HTTP proxy to create the “now” URL.
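Because the manifest carries URL patterns, the proxy only has to fill in placeholders. The sketch below assumes Python `str.format`-style placeholders named `{stream}` and `{number}`; the actual template syntax, and the paths used in the example, are not specified in the source.

```python
def build_url(template, **fields):
    """Fill a manifest URL pattern such as '/live/{stream}/now'.
    Placeholder names and syntax are hypothetical."""
    return template.format(**fields)


def fragment_urls(template, stream, first, count):
    """URLs for `count` consecutively numbered fragments, starting at `first`.
    Each fragment is identified by stream name plus an integer suffix."""
    return [build_url(template, stream=stream, number=n)
            for n in range(first, first + count)]
```

Deriving every forward URL from the cached manifest keeps cache-miss requests as plain HTTP GETs, which is what allows the cache hierarchy to serve fragments like any other object.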

As also described, a stream has a number of FI files. Each contains stream keyframe information for a given time slice. The Fragment Index allows time offsets to be mapped to fragment numbers and byte offsets. The Stream Manifest file defines the time slice for each FI file.

Each IF Fragment contains approximately N seconds of a stream. Each fragment is composed of a header and a payload. The HTTP proxy understands the data in the header, but the payload is opaque. The HTTP proxy links together with a Muxer component to convert the IF-formatted payload to the target format that is streamed to the client player. The fragments are cached in the HTTP proxy for re-use, and each fragment is identified with its stream name and an integer suffix that increases incrementally. As described above, the Archiver has the fragments for the most recent N minutes of the stream, and the rest of the fragments are on the Storage subsystem.
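The sketch below, under stated assumptions, shows fragment naming and the Archiver-versus-Storage decision. It assumes a `name_number` suffix style and that each fragment carries roughly a fixed number of seconds, so the Archiver's N-minute window becomes a count of the newest fragments. Both are simplifications, since fragments are sized for caching rather than cut on exact time boundaries.

```python
def fragment_name(stream_name: str, number: int) -> str:
    """Hypothetical naming: stream name plus an incrementing integer suffix."""
    return f"{stream_name}_{number}"


def fragment_source(number: int, newest: int, seconds_per_fragment: float,
                    archiver_window_minutes: float) -> str:
    """Decide whether a fragment should still be on the Archiver or on Storage.

    Assumes each fragment holds roughly seconds_per_fragment of media, so the
    Archiver window corresponds to a fixed count of the newest fragments.
    """
    if number > newest:
        raise ValueError("fragment has not been produced yet")
    window = int(archiver_window_minutes * 60 // seconds_per_fragment)
    return "archiver" if newest - number < window else "storage"
```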

For non-authenticated content, preferably the client player connects to an http:// URL to play a stream. Query string parameters can be used to request a particular seek time if the default (live if the stream is live, or the beginning of the stream if it is not live) is not appropriate. For authenticated content, preferably the original http:// URL additionally contains a shared authentication token query string parameter generated by the customer origin. This enables the HTTP proxy process to serve the stream for some configured amount of time (e.g., a given number of seconds). After that time, the HTTP proxy process terminates the connection unless, for example, an out-of-band control POST is received with a signed “session token.” Although not meant to be limiting, in one approach this token preferably is generated by the client by connecting to an FMS (or equivalent) edge machine that can perform SWF Verification (as shown in FIG. 3). The machine returns the signed session token to the client to be forwarded back to the HTTP proxy process as a control channel POST. Once the session token is received by the HTTP proxy, the stream preferably will play indefinitely. Other types of stream authentication may be implemented as well.
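The time-limited authorization described above might be modeled as below. `StreamSession`, the injectable clock, and the `verify` callback (standing in for checking the signed session token) are hypothetical.

```python
import time
from typing import Callable


class StreamSession:
    """Serve for a grace period, then only if a valid session token arrived."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + grace_seconds  # end of the token-less window
        self._verified = False

    def accept_session_token(self, token: str,
                             verify: Callable[[str], bool]) -> bool:
        """Handle a control-channel POST; verify() checks the token's signature."""
        if verify(token):
            self._verified = True  # once verified, play indefinitely
        return self._verified

    def may_continue(self) -> bool:
        return self._verified or self._clock() < self._deadline
```

Injecting the clock keeps the policy testable without real waiting.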

FIG. 6 is another view of the flow of the media packets into and out of the streaming server framework of this disclosure for live streaming. As noted above, the framework processes (demuxes) the incoming media packets into an intermediate format (IF). In particular, the Encoder pushes the CDN customer content into an Entry Point (EP). The Puller then pulls the content from the EP and passes the data to its associated Demuxer, which converts the incoming source format (SF, such as FLV) to IF fragments before injecting them into the Archiver network. An Archiver receives data from the RTMP Puller and incrementally writes this data to memory, such as a RAM disk (or other data store). If the HTTP proxy (to which a Client has been associated) requests a Fragment or Fragment Index that is currently in the process of being received from the Puller, the Archiver sends the response (preferably as a chunk-encoded HTTP response) so that the data can be sent as soon as it is received. Once a Fragment or Fragment Index is complete, a designated leader Archiver (selected via a leader election process) attempts to upload the resulting file to the Storage subsystem. As noted above, the muxer component associated with the edge region/server processes (muxes) the packets to the desired target format (TF) before the packets reach the end clients.

A Demuxer process may be integral to the Puller; likewise, a Muxer process may be integral to the HTTP proxy process. There may be one Demuxer process for multiple Pullers; there may be one Muxer process for multiple HTTP proxies (within a particular Region).

As noted above, in terms of functionality, the Demuxer converts regular stream packets into IF fragments and the Muxer does the opposite. By definition, Demuxer and Muxer should complement each other. As noted, the Demuxer can be part of an RTMP Puller process or can be a separate process running on the RTMP Puller machine. The Demuxer receives input via the RTMP Puller. It is responsible for the following: generating the IF Fragment Header; taking the source format and packaging it into the IF body; adding the Fragment Stream Header; pushing the IF fragment to the Archiver; analyzing the fragment and generating index information pertinent to key frame location within a given FLV packet; and pushing key frame information to the Archiver. This can be done synchronously or asynchronously with respect to the IF fragment transmission. Preferably, the Demuxer also is responsible for determining an optimal size of the fragment, which fragment size should be optimal for HTTP proxy caching. The Demuxer can base its decision (regarding the optimal size of the fragment) on the following stream properties: incoming live stream byte rate/bit rate; Key Frame Interval; or a combination of both. Apart from constructing IF Fragments, the Demuxer is also responsible for pushing Stream Properties and key frame information to the Archiver. The Archiver can then create the Stream Manifest file that will be used by the HTTP proxy/Muxer to make fragment index and individual fragment requests. As described above, the Muxer complements the Demuxer. Just as the Demuxer is responsible for constructing IF Fragments, the Muxer is responsible for deconstructing the IF Fragments and converting the IF Payload format to the target format (TF) that the Client requested. The Muxer may also provide the following information to the HTTP proxy: statistics information about HTTP-delivered Streams; client session playback information, such as playback duration; and Muxer health data.
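One hypothetical way the Demuxer could combine byte rate and Key Frame Interval when sizing fragments is to pick the whole number of keyframe intervals whose size is closest to a caching-friendly target. The target size, the function name and the rounding rule are assumptions, not the source's algorithm.

```python
def fragment_duration(bit_rate_bps: float, keyframe_interval_s: float,
                      target_bytes: float) -> float:
    """Fragment duration (seconds) spanning a whole number of keyframe intervals.

    Chooses the interval count whose total size is nearest target_bytes,
    but never fewer than one interval.
    """
    if bit_rate_bps <= 0 or keyframe_interval_s <= 0 or target_bytes <= 0:
        raise ValueError("rates, interval and target must be positive")
    byte_rate = bit_rate_bps / 8                     # bytes per second
    bytes_per_interval = byte_rate * keyframe_interval_s
    intervals = max(1, round(target_bytes / bytes_per_interval))
    return intervals * keyframe_interval_s
```

Aligning fragment boundaries to keyframe intervals keeps each fragment independently decodable, while the byte target keeps object sizes uniform for the proxy cache.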

The Demuxer and Muxer enable dynamic transmux output to other file formats. This enables the system to leverage a single set of content sources for different device capabilities, e.g., iPhone 3.0 streaming using MPEG-2 TS Segments, Microsoft Silverlight 3 (with H.264 playback), Shoutcast, and so forth.

As a variant to the above-described “pull” model that operates between an Encoder and an Archiver, it is also possible to use a “push-based” approach.

Video on Demand (VOD) Delivery

The above-described architecture is useful for live streaming, particularly over formats such as Flash. The following section describes adding video on demand (VOD) support to the platform. In particular, the solution described below provides VOD streaming from customer and Storage subsystem-based origins, provides single and multiple bitrate (SBR and MBR) streaming, provides support for origin content stored in flv and mp4/f4v containers (supported mp4/f4v codecs include, among others, AAC, MP3, and PCM for audio, and H.264 for video), and minimizes download of content beyond what is directly requested by the end user.

For VOD delivery, the stream recorder tier 300 (of FIG. 3) is replaced, preferably with a translation tier, as will be described. For VOD delivery using HTTP, the Fragment Indexes may be generated from the origin content on-the-fly (e.g., by scanning FLV or parsing MP4 MOOV atoms), and these indexes are then cached. Actual data retrievals may then be implemented as “partial object caching” (POC) retrievals directly from source material at the edge region or via an intermediate translation (e.g., by a cache-h parent) into an Intermediate Format. As used herein, partial object caching refers to the ability of an HTTP proxy to fetch a content object in fragments only as needed rather than downloading the entire content object. The HTTP proxy can cache these fragments for future use rather than having to release them after they are served from the proxy. An origin server from which the content object fragments are retrieved in this manner must support the use of HTTP Range requests.
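Partial object caching can be sketched as a block cache in front of a range-fetching callable. The block size, the `PartialObjectCache` class and the `fetch_range` callback are assumptions; the only protocol element taken from the text is that the origin must honor HTTP Range requests.

```python
from typing import Callable


def range_header(first: int, last: int) -> dict:
    """HTTP Range header for an inclusive byte range (RFC 9110 syntax)."""
    return {"Range": f"bytes={first}-{last}"}


class PartialObjectCache:
    """Fetches an origin object in fixed-size blocks, only as needed, and keeps them.

    fetch_range(first, last) must return the bytes of the inclusive range; in
    practice it would issue a GET carrying range_header(first, last).
    """

    def __init__(self, fetch_range: Callable[[int, int], bytes],
                 block_size: int = 64 * 1024):
        self._fetch = fetch_range
        self._block = block_size
        self._blocks: dict[int, bytes] = {}

    def read(self, start: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first_block = start // self._block
        last_block = (start + length - 1) // self._block
        out = bytearray()
        for b in range(first_block, last_block + 1):
            if b not in self._blocks:  # cache miss: fetch just this block
                lo = b * self._block
                self._blocks[b] = self._fetch(lo, lo + self._block - 1)
            out += self._blocks[b]
        skip = start - first_block * self._block
        return bytes(out[skip:skip + length])
```

A second read that falls inside already-cached blocks is served without another origin request.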

Before describing a VOD implementation in detail, the following section describes several ways in which VOD content is off-loaded for HTTP delivery to the CDN. In a first embodiment, a conversion tool (a script) is used to convert source content flv to IF, with the resulting IF files then uploaded to the Storage subsystem. In this approach, metadata is used to have an HTTP proxy go forward to the Storage subsystem to retrieve the stream manifest, which then references the Storage subsystem for the remaining content. In this approach, files in mp4/f4v are first converted to flv (e.g., using ffmpeg copy mode) to change the container to flv. Another approach is to have a CDN customer upload raw media files to the Storage subsystem and to run a conversion tool there. Yet another alternative is to have the customer (or encoder) produce content in IF directly.

The translation tier approach is now described. In this approach, an on-demand dynamic IF generator machine takes requests for IF (manifests, indexes, and fragments) and satisfies these requests by dynamically retrieving flv or mp4/f4v input file ranges (either from the Storage subsystem or customer origin). From there, HTTP proxy treatment is essentially the same as in the “conversion tool” options described above. The generator machine preferably runs its own HTTP proxy (the “translator HTTP proxy”) to cache various inputs and outputs, together with a translator process (described below) that accepts requests (e.g., from a localhost connection to the translator HTTP proxy) and generates IF based on data retrieved from the HTTP proxy via an associated cache process. In an alternative, the translator process may comprise part of the translator HTTP proxy, in which case IF generation takes place within the proxy. Fragment generation may also be carried out in an edge machine HTTP proxy or even further downstream (into the Client itself), such as where a Client maintains a session connection with one or more peer clients.

An architecture and request flow of a preferred approach is shown in FIG. 7. In this embodiment, which is merely representative and non-limiting, a translation tier 700 is located between an origin 702 (e.g., customer origin, or the Storage subsystem, or both) and the stream player tier 704. In a representative embodiment, the translation tier executes in its own portion (e.g., a Microsoft IIS or equivalent network) within the CDN, preferably in a Region dedicated to this purpose. Alternatively, a translator (as described below) may run on a subset of HTTP-based edge machine Regions.

FIG. 8 illustrates a representative translator machine 800. This machine, like the machine shown in FIG. 2, includes CPU, memory, disk store and network interfaces to provide an Internet-accessible machine. In addition, as shown in FIG. 8, in this embodiment, the two main components of the translator machine are the HTTP proxy 802 and a translator process 804. The HTTP proxy 802 performs partial object caching (POC) and interacts with the translator process 804, which generates the stream manifest, index and fragments. The proxy and translator components interface to one another via shared memory 806 and a stream cache process 808, described in more detail below.

Component Descriptions

FIG. 9 is a C++ class diagram describing how an HTTP proxy (referred to as “Ghost”) process and the Muxer module interact during a client session. Within the edge server process (also called “ghost”), there is a set of classes, namely, a Driver class 900, a Stream class 902, and a StreamManifest class 904. A set of interfaces includes a MuxerIOHandler class 906, a MuxerManifestAdaptor class 908, and a Muxer class 910. A MuxerAdaptor Class 912 provides a streaming implementation within the edge server process, with support from a streaming library, called Http Stream Muxer 912. The Muxer implementation comprises the following modules: Muxer Adaptor Class 912, which inherits from Muxer class 910 defined by the edge server process. Muxer Adaptor Class 912 acts as a wrapper/adaptor to make calls to the Http Stream Muxer Library 912. The Http Stream Muxer Library links to the edge server process. Preferably, the Muxer Adaptor Class is the only class that has access to this library's APIs. This library is mainly used to perform all Muxer-related operations, i.e., converting from IF Payload to Target format.

The process flows in FIGS. 10A and 10B illustrate a typical work flow of an Http Stream Request to play live content. Preferably, in this embodiment every new request to the HTTP edge server process (referred to in the drawing as an “Edge Ghost”) for an Http stream is considered a progressive download stream. The workflow in FIGS. 10A and 10B highlights the interaction between the edge server process and the MuxerAdaptor/Http Stream Muxer library. In particular, and as will be described, steps 1000, 1002, 1004, 1006, 1008, 1010, 1012, 1016, 1020, 1022, 1026, 1030 and 1038 typically are performed by one or more of the process classes shown in FIG. 9. Typically, steps 1014, 1018, 1024, 1028, 1032, 1034 and 1036 are performed by the Muxer Adaptor Class. The Muxer Adaptor calls the necessary Http Stream Muxer library APIs to perform certain steps.

By way of background, and to facilitate what is represented in the drawing, a set of connectors is shown, namely, a Player Connector (P), a Cache Store Connector (C), an Archiver/Storage Connector (N), and a Workflow Connector (W). Generally, a client player connects to an edge server process (via the CDN DNS, or otherwise) to play live streams (i.e., connect to the stream's “now” time). The edge server process makes a forward request to an Archiver to fetch the “now” time on a live stream. Metadata in the Manifest File is used by the edge server process to create a “now” URL.

The Player Connector represents an end user Player making a request with a Stream ID and query parameters to seek to a “now” time. As noted above, a stream has a number of FI files, and each contains stream key frame information for a given time slice. The fragment index allows time offsets to be mapped to fragment numbers and byte offsets. The Stream Manifest file defines the time slice for each FI file. An input associated with the FI is the “now” request, and an output is the last received keyframe information for the “now” request. At step 1000, the edge server process accepts the connection for a stream and creates a driver object. Using the Stream ID, the edge ghost process then creates a Stream/Manifest object and constructs a Stream Manifest URL. This is step 1002. Using the Stream Manifest URL, the edge server process obtains the Stream Manifest either from its local cache (Cache Store 1003) or from an Archiver or Storage (illustrated as 1005). This is step 1004. The edge server process then continues at step 1006 (using the Stream Manifest File) to parse the Stream Manifest and to store it in a Manifest object. Using the play time of “now” as identified in the query parameters, the edge server process generates a “now” URL at step 1008. Using this “now” URL, at step 1010 the edge server process creates the Now response, using local cache or by going forward to the Archiver or Storage as necessary. The Now response, which contains a latest keyframe epoch time, is then used by the edge server process at step 1012 to create a MuxerAdaptor Class object, to which the keyframe epoch time is passed. The MuxerAdaptor Class object so created constructs Http stream headers using the MuxerManifestAdaptor class. This is step 1014. These stream headers are then returned to the edge server process, which then sends those headers back to the player in step 1016. The keyframe epoch time is then used by the MuxerAdaptorClass object to calculate a Fragment Index File number. This is step 1018, shown in FIG. 10B. Using the Fragment Index File number, control returns back to the edge server process, which, at step 1020, gets the Fragment Index file from either local cache or Archiver/Storage. Using the Fragment Index file, the edge server process continues at step 1022 to parse the file and pass the keyframe details to the MuxerAdaptorClass object. At step 1024, the MuxerAdaptorClass object uses the fragment number/offset passed from the edge server process to make a request for the fragment. This request is made to the edge server process, using the fragment number. Thus, at step 1026, the edge server process gets the IF fragment from the Archiver/Storage, as the case may be. The fragment is then passed back to the MuxerAdaptorClass object, which, at step 1028, reads the IF fragment and constructs data for the player in the target format. The stream data is then passed back to the edge server process, which, at step 1030, returns it to the player.

At step 1032, the MuxerAdaptorClass object tests whether the fragment is complete. If not, control continues at step 1034, while the object waits for the remaining portion. If, however, the outcome of the test at step 1032 indicates the fragment is complete, the MuxerAdaptorClass continues its processing at step 1036, either by returning to step 1024 (for the next Fragment) or by returning to step 1018 (for the next Fragment Index). If there is no more stream data, control returns to the edge server process, which, at step 1038, closes the Player connection to complete the process of responding to this particular request.
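The loop formed by steps 1018 through 1038 can be summarized as a generator. This is a simplified sketch: it assumes fragments are numbered consecutively, uses the FI only to find the starting fragment and byte offset, and omits the partial-fragment wait of steps 1032 and 1034. The callables `load_index`, `load_fragment` and `mux` are hypothetical stand-ins for the cache/Archiver/Storage fetches and the Muxer.

```python
from typing import Callable, Iterator, Optional


def stream_from(start_time: float, fi_slice_seconds: int,
                load_index: Callable[[int], list],
                load_fragment: Callable[[int], Optional[bytes]],
                mux: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Yield target-format data starting at start_time.

    load_index(n) returns sorted (keyframe_time, fragment_number, byte_offset)
    entries; load_fragment(n) returns an IF payload, or None at end of stream.
    """
    fi_number = int(start_time // fi_slice_seconds)   # step 1018
    entries = load_index(fi_number)                   # steps 1020-1022
    if not entries:
        return
    candidates = [e for e in entries if e[0] <= start_time] or entries[:1]
    _, fragment_number, offset = candidates[-1]
    while True:
        payload = load_fragment(fragment_number)      # steps 1024-1026
        if payload is None:                           # no more stream data
            return                                    # step 1038
        yield mux(payload[offset:])                   # steps 1028-1030
        fragment_number, offset = fragment_number + 1, 0  # step 1036
```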

In a seek mode, preferably the client player does not establish a new connection for the seek request. For Http-based streaming, preferably the client only initiates a new connection to the edge ghost process to post a seek control message. Preferably, the session id for this seek request is the same as that used when the connection is first established. When the edge ghost process receives the seek control message from the client player, preferably it uses the same objects that were created when the session first started. For seek requests, the edge ghost process may call a dedicated Muxer process function. The MuxerAdaptorClass and Http Stream Muxer library handle the transition from the previous mode, e.g., live mode, to the new seek mode.

As noted above, a representative IF Fragment includes a fragment header, a fragment stream header, and a payload. The fragment header is a fixed-size header, and each fragment has only one such header. It provides the IF specification version, fragment number, fragment creation time, fragment stream type (audio/video/interleaved AV), the latest stream manifest file version, a payload offset field to indicate the start of the payload data, header type and header length. The fragment stream header can be of variable size, and each fragment has only one such header. It provides information about the stream properties: stream name, stream source format and sub-type (e.g., FLV), IF payload format and sub-type (e.g., FLV, F-MP4, etc.), header type and header length.
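The fixed-size fragment header lends itself to a binary struct. The field order follows the list above, but the field widths, the big-endian byte order, the numeric stream-type codes and the `FragmentHeader` name are illustrative assumptions; the IF specification's actual layout is not given here.

```python
import struct
from typing import NamedTuple

# Assumed layout, big-endian, no padding:
# version B, fragment number I, creation time (ms) Q, stream type B,
# manifest version H, payload offset I, header type B, header length H.
HEADER_FMT = ">BIQBHIBH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 23 bytes

STREAM_TYPES = {0: "audio", 1: "video", 2: "interleaved_av"}  # hypothetical codes


class FragmentHeader(NamedTuple):
    if_version: int
    fragment_number: int
    creation_time_ms: int
    stream_type: int
    manifest_version: int
    payload_offset: int  # where the payload starts within the fragment
    header_type: int
    header_length: int

    def pack(self) -> bytes:
        return struct.pack(HEADER_FMT, *self)

    @classmethod
    def unpack(cls, data: bytes) -> "FragmentHeader":
        return cls(*struct.unpack_from(HEADER_FMT, data))
```

Because the proxy only needs the header, it can read `payload_offset` and hand the remaining bytes to the Muxer without interpreting the payload.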

As described, preferably each IF fragment contains approximately “n” seconds of stream. Preferably, IF files are sized for optimal edge ghost process caching and not necessarily fragmented on time boundaries. As noted above, each fragment is composed of a header and a payload. The edge ghost process understands the data in the header, but typically the payload is opaque. As illustrated in FIGS. 10A-10B, the edge ghost process links together with the muxer component to convert the IF-formatted payload to the target format, which is then streamed to the client player. Preferably, the edge ghost process also caches the fragments, preferably as regular StoreEntries. Each fragment is identified with its stream name and an integer suffix that increases incrementally. The Archiver has the fragments for the most recent N minutes of the stream, and the rest of the fragments are on Storage. Preferably, any fragment for a particular stream type has an associated IF Fragment Number, which is a unique identifier. The fragment number increases incrementally. The edge ghost process requests an IF fragment using its fragment number. The RTMP Puller is responsible for assigning a unique number to each fragment.

As noted above, the content provider (or a third party) encoder pushes a live FLV stream to the entry point, e.g., over RTMP. The RTMP Core pulls the stream from the entry point and breaks it up into the file fragments. The RTMP Core contacts the Stream Manifest Manager and provides stream information. The Stream Manifest Manager supplies information regarding the Archiver to which the IF Fragments are to be pushed. The RTMP Core creates the IF Fragments and pushes them to the Archiver. The IF Fragment size is optimized for edge server ghost process caching. The Archiver receives from the RTMP Core the IF Fragments along with stream keyframe information. It then appends the stream's keyframe information to a current Fragment Index (FI) file. The Archiver caches a predetermined number of IF Fragments and fragment indexes for live playback. As the fragments age out, they are deleted from the Archiver and archived to Storage. At set intervals (e.g., every few minutes), the Archiver closes the current FI file, archives it to Storage, and begins creating a new FI file. The edge ghost process gets the fragments for live playback and limited DVR time periods from the Archiver. Fragments no longer available on the Archiver are retrieved from Storage. Each IF fragment is a separate object for the edge ghost process that can be accessed through HTTP. Thus, the live stream is broken up into many small objects/fragments. The edge ghost process receives DVR commands from the client player on a separate HTTP connection. When the client player requests that playback begin from a new stream position, the edge ghost process uses metadata from the Stream Manifest File to calculate which FI file contains the target time offset. The FI file is retrieved (from a peer, from an Archiver, or from Storage, as the case may be) and contains the IF fragment and byte offset to begin streaming to the client player. Thus, requests are dynamically re-assembled by the edge ghost process from fragments on-the-fly.

As also noted above, the muxer and demuxer enable dynamic transmux output to other file formats. This enables the system to leverage a single set of content sources for different device capabilities (e.g., iPhone 3.0 streaming using MPEG-2 TS Segments, Microsoft Silverlight 3 (with H.264 playback), Shoutcast, and so forth).

The following provides additional details regarding the support provided in the edge server process.

Preferably, the functionality described herein is implemented as a separate subsystem in the edge server process. FIG. 11 is a diagram that shows the relationship between the C++ classes in one embodiment. These classes are now described. As noted above, preferably there is a set of edge server process classes that are not exposed to the muxer implementations. They include a Driver 1100, SessionManager 1102, StreamRepository 1104, StreamEntry 1106, Stream 1108, and Manifest 1110. The abstract classes for the muxer include MuxerIOHandler 1112, Muxer 1114, MuxerManifestAdapter 1116, and FlvMuxer 1118.

There is an instance of the Driver 1100 in request_t for each request that is configured for Http streaming. This is the overall driver. It looks up the Stream instance for a Stream ID from the StreamRepository 1104. If necessary, the Driver 1100 fetches and parses the stream manifest for the stream. It also gets (from the Archiver) the “now” time for the stream and checks to be sure that the time requested by the client is permitted; if not, an error is returned. The Driver 1100 is also responsible for creating a Muxer 1114. Further, the Driver sends a startStream message to the Muxer. The Muxer responds by assembling the stream using the MuxerIOHandler interface 1112 from the Driver 1100. When the stream is finished, the Muxer informs the Driver 1100, which terminates the chunk-encoded response to the client.

The edge server process generates a unique session ID for each stream client. The SessionManager 1102 maps each session ID to its current Driver. In particular, the session ID is added to the generated stream by the Muxer 1114 using an “onMeta” FLV tag. The client can make “control” requests to the edge server process to change the current time of a stream (DVR functionality), change the bitrate, or switch to a different rendition of the stream. The SessionManager also is used for reporting purposes.
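A minimal sketch of the SessionManager's mapping, with control requests applied to the session's Driver. The `Driver` fields, the 128-bit random session ID and the command strings (borrowed from the control-channel requests described earlier: seek, switch, sessionToken) are assumptions.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Driver:
    """Minimal stand-in for the per-request Driver (hypothetical fields)."""
    stream_id: str
    track: str
    position_s: float = 0.0
    verified: bool = False


class SessionManager:
    """Maps each session ID to its current Driver and applies control requests."""

    def __init__(self) -> None:
        self._drivers: dict[str, Driver] = {}

    def register(self, driver: Driver) -> str:
        session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
        self._drivers[session_id] = driver
        return session_id

    def control(self, session_id: str, command: str, value) -> None:
        driver = self._drivers.get(session_id)
        if driver is None:
            raise KeyError(f"unknown session {session_id}")
        if command == "seek":            # DVR: change the current time
            driver.position_s = float(value)
        elif command == "switch":        # bitrate or rendition change
            driver.track = str(value)
        elif command == "sessionToken":  # token assumed verified upstream
            driver.verified = bool(value)
        else:
            raise ValueError(f"unsupported control command {command!r}")

    def active_sessions(self) -> int:
        """Number of live sessions, e.g. for reporting."""
        return len(self._drivers)

    def release(self, session_id: str) -> None:
        self._drivers.pop(session_id, None)
```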

The StreamRepository 1104 is a hash table with a list of all the streams, keyed by StreamID, where StreamID is a metadata parameter. The value for each StreamID is a StreamEntry. The StreamRepository can be used for reporting purposes to see all the streams currently served by the edge server process.

The StreamEntry 1106 is a set of two streams: primary and backup. These streams show the same event but might come from two different entrypoints, which may not be synchronized. Each of these streams preferably has its own manifest. The StreamEntry knows which one is the “preferred” stream.
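The StreamRepository/StreamEntry relationship might look like the following. The `healthy` flag and the failover rule in `preferred()` are hypothetical; the text states only that the StreamEntry knows which stream is preferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    name: str
    healthy: bool = True  # hypothetical health flag


@dataclass
class StreamEntry:
    """Primary and backup streams of the same event."""
    primary: Stream
    backup: Optional[Stream] = None

    def preferred(self) -> Stream:
        # Assumed policy: primary unless it is unhealthy and a healthy backup exists.
        if not self.primary.healthy and self.backup is not None and self.backup.healthy:
            return self.backup
        return self.primary


class StreamRepository:
    """Hash table of StreamEntry objects keyed by StreamID."""

    def __init__(self) -> None:
        self._entries: dict[str, StreamEntry] = {}

    def add(self, stream_id: str, entry: StreamEntry) -> StreamEntry:
        return self._entries.setdefault(stream_id, entry)

    def lookup(self, stream_id: str) -> Optional[StreamEntry]:
        return self._entries.get(stream_id)

    def stream_ids(self) -> list[str]:
        """All StreamIDs currently served, for reporting."""
        return sorted(self._entries)
```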

The Stream 1108 has a manifest and a set of functions to asynchronously fetch the “now” time, indexes and fragments, and it keeps track of miscellaneous statistics (number of concurrent clients, etc.). The Manifest 1110 encapsulates the stream manifest. Another class (called Fetcher), not shown, is used to fetch a variety of objects, such as manifests, “now” time, IF indexes, and IF fragments.

The Muxer interfaces provide the FLV muxer implementation. MuxerIOHandler 1112 is used by the Muxer to handle asynchronous I/O. The Muxer 1114 transforms an input (made of IF fragments) into a stream that a player can use. The MuxerManifestAdapter 1116 is a class that gives the Muxer access to the Manifest 1110. The muxer implementation is done by creating a streaming::FlvMuxer class which implements the streaming::Muxer interface.

As noted above, the Stream Manifest is an XML file that contains the Stream epoch time (when the stream started, or possibly the oldest archived portion still available), information about fragment indexes and how to construct the URL for an FI file, how to construct the URL for the “now” request, and how to construct the URL for the fragments. The edge server process caches the manifest as a regular StoreEntry. It is assumed that the manifest URL is the same as the incoming URL without the query string. Preferably, there is a manifest file on the Archiver, and one on Storage. When the manifest is needed, the Archiver is tried first.

A stream has a number of FI files. A Fragment Index (FI) contains stream index information for a given time slice. The FI allows time offsets to be mapped to fragment numbers and byte offsets. The Stream Manifest file defines the time slice for each FI file. The input to the FI is “now” or a requested time offset. The output of the FI is a fragment number and byte offset. As described above, client players connect to the ghost edge server process to play live streams (i.e., connect to the stream's “now” time). In response, the edge server process makes a forward request to the Archiver to fetch the “now” time on a live stream. Metadata in the manifest file is used by the process to create the “now” URL.

Each fragment contains approximately n seconds of stream. As noted above, preferably IF files are sized for optimal caching and not fragmented on time boundaries. Each fragment is composed of a header and a payload. The edge server process understands the data in the header, but the payload is opaque. The edge server process links together with a muxer component to convert the IF-formatted payload to the target format, which is then streamed to the client player. The edge server process caches the fragments as regular StoreEntries. Each fragment is identified with its stream URL and a query string parameter indicating the fragment number. The Archiver has the fragments for the most recent N minutes of the stream, and the rest of the fragments are on Storage.

There may be several incoming source formats to an EP. Preferably, the incoming source formats get converted to the IF format at the RTMP Puller. The edge server process converts the IF format to the target format based on the user request.

The following provides additional details regarding the RTMP Puller component.

As illustrated in FIG. 12, the RTMP Puller 1200 acts like a Flash client when interacting with the Flash EP, and it connects to Archivers 1202, 1204 and 1206 by querying the stream manifest file. Preferably, the RTMP Puller resides in edge server machines and pulls streams from the Flash entry points 1208 in a same or nearby CDN region. Preferably, more than one instance of RTMP Puller exists in a given machine. The RTMP Puller has knowledge of a set of one or more entry points to which it should connect (as identified in metadata or via some other means). Upon startup, the RTMP Puller establishes a master connection (e.g., an RTMP channel used for control messages) with the target entry points. Once a stream announcement (e.g., as described in U.S. Pat. No. 6,751,673) reaches the RTMP Puller, the RTMP Puller will connect, authenticate (e.g., using an authentication scheme as described above), and then pull the Flash streams from the Flash EP. The RTMP Puller injects the incoming FLV stream into the Demuxer. As noted above, the Demuxer packages the incoming stream into an appropriate IF fragment and pushes the IF fragments to the Archivers interested in the stream. The actual stream may be pulled from the EP on a separate stream connection.

As shown in FIG. 12, preferably the RTMP Puller pulls the stream S1 from EP1 and pushes the demuxed stream to Archiver-1 1202 and Archiver-2 1204 in Region-1, and to Archiver-1 1206 in Region-2, choosing the Archiver targets via a stream manifest file supplied by the stream manifest manager.

Preferably, the RTMP Puller acts as a leader node. The diagram in FIG. 12 illustrates one possible interaction between the Flash EP and RTMP Puller.

Preferably, the RTMP Puller retains the original live stream name coming from the EP and, based on whether the stream is a primary or secondary stream, tags the IF fragment name and then invokes an Archiver-access plug-in to share the data. The RTMP Puller knows or can ascertain whether a given stream is provisioned as a multi-bitrate (MBR) stream or a regular stream. For MBR streams, one instance of the RTMP Puller pulls all the pertinent MBR streams from the Flash EP, but preferably over separate connections.

FIG. 13 illustrates how an MBR stream flows from encoder to RTMP Puller in one such embodiment. As can be seen, the encoder 1300 interleaves all three (in this example) MBR streams over a single socket connection 1302. The EP 1304 announces to RTMP Puller 1306 the availability of the various streams. Based on the provisioning and the streamID, the RTMP Puller detects MBR streams and pulls all the relevant streams over a dedicated RTMP connection. It then packages them into the IF fragments and pushes the data to the Archivers. Preferably, each MBR stream is packaged as if it is an independent stream, but tag grouping consistency is maintained across different MBR streams for a given streamID. Also, the EP remembers the set of streams assigned to a given RTMP Puller. If the RTMP Puller instance becomes unavailable due to connectivity or process crash issues, streams pertinent to that RTMP Puller are redistributed to the remaining RTMP Puller(s).
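Redistribution after a Puller failure could be as simple as the sketch below, which moves each orphaned streamID (with all of its MBR renditions) to the least-loaded surviving Puller. The least-loaded policy, the name-based tie-break and the data layout are assumptions.

```python
def redistribute(assignments: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Reassign a failed RTMP Puller's streams to the remaining Pullers.

    Each streamID moves as one unit, so all MBR renditions of that streamID
    stay on a single Puller. Hypothetical policy: least-loaded Puller first,
    ties broken by Puller name for determinism.
    """
    orphans = assignments.pop(failed, [])
    if orphans and not assignments:
        raise RuntimeError("no RTMP Puller left to take over")
    for stream_id in orphans:
        target = min(assignments, key=lambda p: (len(assignments[p]), p))
        assignments[target].append(stream_id)
    return assignments
```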

As noted above, the RTMP Puller uses the Demuxer to convert the source format to the IF before pushing the packets to the Archiver. The Demuxer can be part of the RTMP Puller or a separate process running in association therewith, as described above.

Preferably, the RTMP Puller pushes multiple copies of identical IF fragments to a configurable number of Archivers. Preferably, communications between the RTMP Puller and the Archiver occur via an Archiver-access adaptor, which reads the stream manifest file, chooses an appropriate Archiver, and appends the Archiver's information along with the Archiver role (as primary or backup). FIG. 14 describes the interaction between the RTMP Puller and Archiver using the RTMP Puller's archiver-access adaptor.
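A sketch of pushing identical copies of a fragment to a configurable number of Archivers, with the first target tagged primary and the rest backup. The `send` callback and the choose-the-first-N rule are hypothetical simplifications of what the Archiver-access adaptor does.

```python
from typing import Callable


def push_fragment(fragment: bytes, archivers: list[str], copies: int,
                  send: Callable[[str, str, bytes], bool]) -> list[tuple[str, str]]:
    """Push identical copies of an IF fragment to `copies` Archivers.

    archivers: ordered candidate hosts taken from the stream manifest.
    send(host, role, fragment) returns True if that Archiver accepted it.
    Returns the (host, role) pairs that accepted the fragment.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    accepted = []
    for i, host in enumerate(archivers[:copies]):
        role = "primary" if i == 0 else "backup"
        if send(host, role, fragment):
            accepted.append((host, role))
    return accepted
```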

An RTMP Puller preferably uses an Archiver Client library to interact with the Archivers and the Stream Manifest Manager (SMM) system. The library ensures that the Stream Manifest is updated to indicate the newly-published stream. As noted above, and based on the RTMP Puller's region, a set of Archivers is chosen to handle the stream. Internally, the RTMP Puller uses the demuxer (as a library) to split the incoming stream into FLV tags, identify key frames, and package the resulting data in Intermediate Format. The Archiver Client library pushes the stream content to each Archiver in the set. Preferably, the Client library pushes out all of the stream data using authenticated, chunk-encoded HTTP requests. To ensure that FLV tags are delivered atomically, preferably each HTTP chunk contains an integral number of complete FLV tags. Chunks carry additional information such as when to start new fragments, where key frames can be found, and per-chunk integrity hashes. If any of the connections fail, the client library issues a special request to the Archiver to discover the correct “resume” position and then starts another POST from that point forward.
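The rule that each HTTP chunk carries only complete FLV tags can be illustrated with the standard FLV tag layout (an 11-byte header whose bytes 1 to 3 hold DataSize, the tag data, then a 4-byte PreviousTagSize). The chunk dictionary, the size limit, the `build_tag` helper and the choice of SHA-256 as the per-chunk integrity hash are assumptions for illustration.

```python
import hashlib

FLV_TAG_HEADER = 11  # TagType(1) DataSize(3) Timestamp(3) TsExt(1) StreamID(3)
PREV_TAG_SIZE = 4    # PreviousTagSize trailer after each tag
FLV_VIDEO = 9


def build_tag(tag_type: int, data: bytes) -> bytes:
    """Build one FLV tag (timestamp and stream ID zeroed), for illustration."""
    header = bytes([tag_type]) + len(data).to_bytes(3, "big") + bytes(7)
    return header + data + (FLV_TAG_HEADER + len(data)).to_bytes(4, "big")


def split_flv_tags(buf: bytes) -> list[bytes]:
    """Split a run of complete FLV tags (no FLV file header) into tags."""
    tags, pos = [], 0
    while pos < len(buf):
        if pos + FLV_TAG_HEADER > len(buf):
            raise ValueError("truncated FLV tag header")
        data_size = int.from_bytes(buf[pos + 1:pos + 4], "big")
        end = pos + FLV_TAG_HEADER + data_size + PREV_TAG_SIZE
        if end > len(buf):
            raise ValueError("truncated FLV tag body")
        tags.append(buf[pos:end])
        pos = end
    return tags


def is_keyframe(tag: bytes) -> bool:
    # Video FrameType is the high nibble of the first data byte; 1 = key frame.
    return tag[0] == FLV_VIDEO and len(tag) > FLV_TAG_HEADER and tag[FLV_TAG_HEADER] >> 4 == 1


def _finish(tags: list[bytes]) -> dict:
    body = b"".join(tags)
    offsets, pos = [], 0
    for tag in tags:
        if is_keyframe(tag):
            offsets.append(pos)  # where key frames can be found in this chunk
        pos += len(tag)
    return {"body": body, "keyframe_offsets": offsets,
            "sha256": hashlib.sha256(body).hexdigest()}


def make_chunks(tags: list[bytes], max_chunk_bytes: int) -> list[dict]:
    """Group whole tags into chunks; a tag is never split across chunks.

    A single tag larger than max_chunk_bytes gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for tag in tags:
        if current and size + len(tag) > max_chunk_bytes:
            chunks.append(_finish(current))
            current, size = [], 0
        current.append(tag)
        size += len(tag)
    if current:
        chunks.append(_finish(current))
    return chunks
```

Since every chunk boundary is also a tag boundary, a receiver that loses a connection can resume at any chunk start without re-synchronizing inside a tag.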

The Archiver receives data from RTMP Puller and incrementally writes this data to a RAM disk, or equivalent. If an edge server process requests a fragment or fragment index that is still being received from RTMP Puller, Archiver sends a chunk-encoded HTTP response so that the data can be forwarded as soon as it arrives, without the full file size being known in advance. If a response is interrupted before the edge server process receives the end-of-response marker from Archiver, the edge server process simply discards the object and re-requests the data if it is still needed. Once a fragment or fragment index is complete, a designated “leader” Archiver attempts to upload the resulting file to Storage. Alternatively, several fragments may be combined into a single serve-from-zip-like archive so that Storage stores fewer, larger objects. In this situation, Storage hides the aggregation from the edge server process by looking up the right fragment within the archive. The non-lead Archivers monitor the lead Archiver's progress and perform the Storage uploads if the lead Archiver does not.
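The leader/non-leader upload rule can be reduced to a single decision function. This is a minimal sketch, assuming that a non-lead Archiver takes over only after a fixed grace period has passed without a completed upload. The text does not say how non-lead Archivers monitor progress, so the grace period and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FragmentState:
    completed_at: float     # time the fragment finished arriving from RTMP Puller
    uploaded: bool = False  # whether Storage already holds the object


def should_upload(is_leader, state, now, grace_seconds):
    """Decide whether this Archiver should upload a completed fragment.

    Assumption: non-lead Archivers wait `grace_seconds` for the leader first.
    """
    if state.uploaded:
        return False  # someone already uploaded it
    if is_leader:
        return True
    return now - state.completed_at >= grace_seconds
```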

The following provides details regarding a Stream Manifest Manager (SMM) component and its application programming interface. Stream Manifest Manager requests are standard HTTP requests with specific methods, URL paths, headers, and body content. The hostname for each request typically depends on whether the stream is a primary or backup stream; preferably, independent SMM instances are run for primary and backup streams. The source (RTMP Puller) may use domain names configured via a network configuration file. These domain names may be resolved to a specific, live SMM machine by the process described in U.S. Pat. No. 7,111,061. Preferably, an SMM instance runs on two separate ports: a first port accepts only “read-only” requests, while a second port accepts all requests. This port separation allows separate processes to service update requests and edge server process download requests in the future. Preferably, request parameters are specified in the request URL and request headers. Preferably, SMM-specific request headers are prefixed with “X-smm-*”.
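The two-port policy and the header convention can be illustrated as follows. This is a hedged sketch. Treating GET (manifest retrieval) as the only read-only method is an assumption, and so are the port labels `"read-only"` and `"all"` and the helper names.

```python
READ_ONLY_METHODS = {"GET"}  # assumption: manifest retrieval is the read-only request
SMM_PREFIX = "x-smm-"


def is_allowed(port_kind, method):
    """Apply the two-port policy: one port is read-only, the other accepts all."""
    if port_kind == "read-only":
        return method.upper() in READ_ONLY_METHODS
    if port_kind == "all":
        return True
    raise ValueError(f"unknown port kind: {port_kind!r}")


def smm_params(headers):
    """Collect SMM-specific headers, keyed by the name after the prefix."""
    return {k[len(SMM_PREFIX):].lower(): v
            for k, v in headers.items() if k.lower().startswith(SMM_PREFIX)}
```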

In operation, the source (RTMP Puller) forwards all “X-smm-*” headers from an SMM “START” response to the Archiver Client Library, which in turn forwards these values to the Archiver. This data flow eliminates the need for per-stream provisioning/configuration information in Archiver. To prevent confusion among Stream Manifest Managers and Archivers when multiple sources (RTMP Pullers) are involved, each time a source begins dealing with a stream, it preferably assigns a unique identifier (within that particular track). This identifier is represented as two separate values: “X-smm-source-start,” which is the epoch time at which the source received the first request related to this stream, and “X-smm-source-identity,” which is an identifier for the source itself, e.g., its IP address. Preferably, source identifiers are ordered using the “start” time as the major comparison key, and requests from the source to SMM include a source identifier. Preferably, this same identifier information is passed to the Archiver (which is accomplished implicitly by forwarding the X-smm-* headers to Archiver). In one embodiment, SMM supports setting the “stream type,” i.e., the “type” attribute on the “track” tag that contains the actual ranges. Preferably, the stream type is set by adding “/A”, “/V”, or “/AV” after the primary/backup designator in the START, STOP, or DELETE URL. If the type is omitted, the default (“/AV”) is assumed.
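The source identifier and the stream-type suffix can be sketched as below. Using the identity as a tie-breaker after the “start” time is an assumption made here to obtain a total order; the text states only that the start time is the major key. The URL layout in `start_url` is hypothetical apart from the type suffix following the primary/backup designator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceId:
    start: int     # X-smm-source-start: epoch time of the first related request
    identity: str  # X-smm-source-identity: e.g. the source's IP address

    def sort_key(self):
        # "start" time is the major key; identity as tie-breaker is an assumption.
        return (self.start, self.identity)

    def as_headers(self):
        return {"X-smm-source-start": str(self.start),
                "X-smm-source-identity": self.identity}


VALID_TYPES = {"A", "V", "AV"}


def start_url(stream_path, designator, stream_type=None):
    """Build a hypothetical START URL with the /A, /V or /AV type suffix."""
    stream_type = stream_type or "AV"  # default when the type is omitted
    if stream_type not in VALID_TYPES:
        raise ValueError(f"bad stream type: {stream_type}")
    return f"/{stream_path}/{designator}/{stream_type}"
```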

Preferably, communication between the source (RTMP Puller), SMMs, and Archivers is protected by a single encryption key set; RTMP Puller accepts streams only from valid EPs, and EPs accept streams only from authorized encoders. In one embodiment, global configuration file variables define the keys for internal communication. To support key rotation, the first key listed is used to sign requests, while the full set is used to verify requests. Preferably, Archiver uses two additional sets of keys for its own use: a per-customer Storage upload (postfile) key, and a per-customer edge server process download key. Preferably, the actual key secrets are distributed by MDT (such as described in U.S. Pat. No. 7,111,057) to both Archiver and SMM. To eliminate the need for per-stream provisioning and configuration information in Archiver, each set of keys in the MDT channel is “named,” and SMM sends to Archiver (via the source) the name of the key set to be used for Storage uploads and the name of the key set to be used for edge server (“ghost”) process downloads for each individual stream track. Preferably, SMM consumes an MDT key channel to authenticate Stream Manifest GET requests from the edge server process.
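The rotation rule (sign with the first key, verify against the full set) can be sketched as follows. HMAC-SHA256 is an assumed signing scheme; the text does not name one.

```python
import hashlib
import hmac


def sign(keys, message):
    """Sign with the first key in the configured list."""
    return hmac.new(keys[0], message, hashlib.sha256).hexdigest()


def verify(keys, message, signature):
    """Accept a signature made with any key in the set, so rotation is seamless."""
    return any(
        hmac.compare_digest(hmac.new(k, message, hashlib.sha256).hexdigest(), signature)
        for k in keys
    )
```

During a rotation, a new key is placed first while the old key remains in the list. Peers that still sign with the old key are therefore accepted until the old key is finally removed.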

The SMM exposes an application programming interface (API). A START request is used to initiate streaming. A STOP request is used by the stream source, e.g., RTMP Puller, to indicate that a stream has ended. A DELETE request to an SMM may be used to remove a manifest, a specific track, or a specific range. Manifests are retrieved through the API using a GET request. A RESUME MODE request is a variant of a START request: when SMM wants to reuse the same URL path as an existing stream that SMM believes might still be “live,” it needs assistance from RTMP Puller to determine the correct starting fragment number, and this request serves that purpose. A REMAP request is used by the stream source to initiate an evaluation of quality for a particular Archiver set and to determine whether a remap to another set is needed. Preferably, requests to SMM are standard HTTP requests with authentication headers. SMM machines include a utility that is an authentication wrapper around cURL.

When a source (RTMP Puller) learns of a new stream track, it begins by making a START request against the SMM domain name associated with the primary/backup status of the stream. This START request has one of several different replies. A “give up” reply (case (1)) indicates that another source with a higher source identifier has already started streaming. Another reply (case (2)) is to start a new URL path beginning with fragment 0. This case applies to a brand-new track, when the status of a previous source cannot be determined, and when certain index parameters need to be changed. In this case, the source can initialize an Archiver Client Library to begin pushing the stream to the Archiver set specified by SMM, starting with fragment 0. Another reply (case (3)) is to use an existing URL path beginning with a non-zero fragment. This applies to a previously existing track when the previous source stopped the stream cleanly (or when it could be determined when the previous source closed the stream). In this case, the source can initialize the Archiver Client Library to begin pushing the stream to the Archiver set and fragment specified by SMM. A fourth reply (case (4)) is a request that the source attempt to determine the correct resume point (fragment) in an existing URL path. This applies to a previously existing track when the previous source was not cleanly stopped. When SMM chooses this option, the source must use the Archiver Client Library to determine the correct “resume” fragment number. Once this number is determined, the source makes a second START “resume mode” SMM request with some additional information. SMM then replies with one of cases (1), (2), and (3) described above. If the reply is case (2), the source may need to close the Archiver Client Library instance and create a new one, because arbitrary parameters may have changed. Case (3) uses the same Archiver Client Library instance that computed the resume offset. To help the source decide whether or not to start a new Archiver Client Library instance, the “resume mode” reply includes a special “X-smm-reset” header that directly tells the source whether a new Client instance needs to be created.
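The START / “resume mode” exchange can be sketched as a small driver. This is hedged throughout. Distinguishing case (2) from case (3) by whether `X-smm-first-fragment` is zero is an assumption. The `X-smm-give-up` header and the accepted `X-smm-reset` values are hypothetical. `smm_start` and `find_resume_fragment` stand in for the real SMM request and Archiver Client Library probe. Only the `X-smm-first-fragment: -1` signal for case (4) comes from the text.

```python
from enum import Enum


class Action(Enum):
    GIVE_UP = 1      # case (1): a higher-identifier source is already streaming
    NEW_PATH = 2     # case (2): new URL path starting at fragment 0
    CONTINUE = 3     # case (3): existing URL path at a non-zero fragment
    FIND_RESUME = 4  # case (4): probe the Archivers for the resume fragment


def classify_start_reply(headers):
    if headers.get("X-smm-give-up") == "1":  # hypothetical header
        return Action.GIVE_UP, None
    first = int(headers["X-smm-first-fragment"])
    if first == -1:
        return Action.FIND_RESUME, None
    return (Action.NEW_PATH if first == 0 else Action.CONTINUE), first


def needs_new_client(headers):
    # The "resume mode" reply says directly whether to recreate the Client.
    return headers.get("X-smm-reset", "0").lower() in ("1", "true", "yes")


def start_stream(smm_start, find_resume_fragment):
    """Return (action, first_fragment, recreate_client).

    `smm_start(resume_fragment)` sends START (resume mode when the argument is
    not None) and returns reply headers; both callables are hypothetical.
    """
    action, first = classify_start_reply(smm_start(None))
    if action is not Action.FIND_RESUME:
        return action, first, False
    reply = smm_start(find_resume_fragment())
    action, first = classify_start_reply(reply)
    if action is Action.FIND_RESUME:
        raise RuntimeError("a resume-mode reply must be case (1), (2) or (3)")
    return action, first, needs_new_client(reply)
```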

When a source sees the end of the stream, it makes a STOP request to SMM once it has received a “close” notification from the Archiver Client Library for that stream.

In an alternative embodiment, SMM may also make periodic “now” requests against the Archivers to determine whether the stream is actually still live. If the Archivers have not received data within some configured amount of time, the SMM declares the stream dead. To enable this behavior, the “now” reply should include header fields indicating how many seconds have elapsed since the most recent update to “now” (or the epoch time of that update). Additionally, the last fragment number may be useful to terminate the range. If SMM cannot contact a sufficient number of Archivers, no update is made to the Stream Manifest. There are two basic requests: START and STOP. A START request is used by the stream source (e.g., RTMP Puller) to initiate streaming. This request creates and/or updates whichever XML tags are necessary to indicate to the edge server process that the stream is live. A “resume mode” request is a variation on a “START” request. When Stream Manifest Manager would like to reuse the same URL path as an existing stream that SMM believes might still be “live,” SMM needs some help from RTMP Puller to determine the correct starting fragment number. When SMM responds to a “START” request with the reply header “X-smm-first-fragment: −1”, RTMP Puller must attempt to determine whether the returned Archiver set is able to resume the stream and, if so, what the first fragment number of the new stream will be.
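The liveness check can be sketched as below, assuming one parsed “now” reply per reachable Archiver. The following choices are assumptions, not taken from the text: the quorum size, declaring the stream dead only when every reachable Archiver is stale, and ending the range at the highest reported fragment. The `NowReply` fields mirror the two header values the text suggests.

```python
from dataclasses import dataclass


@dataclass
class NowReply:
    seconds_since_update: float  # elapsed time since the Archiver last got data
    last_fragment: int           # most recent fragment number it holds


def evaluate_liveness(replies, min_replies, dead_after_seconds):
    """Return None (no manifest update), ("live", None) or ("dead", last_fragment).

    `replies` holds one NowReply per reachable Archiver; unreachable ones are absent.
    """
    if len(replies) < min_replies:
        return None  # too few Archivers answered: leave the manifest untouched
    if all(r.seconds_since_update >= dead_after_seconds for r in replies):
        # Terminate the range at the highest fragment any Archiver reported.
        return ("dead", max(r.last_fragment for r in replies))
    return ("live", None)
```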

As noted above, incoming media packets are demuxed into an Intermediate Format. The muxer component (residing closer to or on the edge) muxes the packets to the desired target format before the packets reach the end clients. FIG. 5 above depicts the flow of media packets into and out of the framework. In one embodiment, the ingress and egress format is Adobe FLV, but this is not a limitation, as noted. In this example implementation, the encoder pushes the CDN customer content into the Flash entry point (EP). The RTMP Puller then pulls the content from the Flash EP and passes the data to the demuxer. The demuxer converts the incoming source format (e.g., FLV) to IF fragments before injection into the Archiver sub-network.

The above-described embodiments provide a format-agnostic streaming architecture that utilizes an HTTP edge network for object delivery.

The above-described approach provides numerous advantages. The techniques described herein facilitate the delivery of high definition video and audio (including advanced video features, such as DVR) over an HTTP edge network, which, in a typical CDN, is the network with the largest footprint. By implementing these techniques, a provider can leverage its existing HTTP-based servers instead of having to implement and maintain dedicated server networks to support multiple third-party runtime environments. Moreover, because the delivery is HTTP-based, the content can be seamlessly delivered to clients operating across fixed-line and mobile environments. No special client software is required, as the HTTP proxy (that responds to the client request) dynamically re-assembles the fragments that it obtains and serves the requested content via HTTP. Further, because delivery within the set of interconnected machines of the CDN preferably takes advantage of an intermediate format, the network can ingest content in one format yet serve it in another, all while preserving single or multiple bitrates and DVR-like functionality. Thus, for example, the network may take in live RTMP packets and serve the content as an FLV progressive download. Preferably, each IF fragment of the stream is a separate object for the HTTP proxy that can be accessed, cached, and served via HTTP. According to this scheme, the stream is broken up into many small objects (fragments), with each fragment managed separately.

The network is not limited to use with any particular runtime environment such as Flash. By leveraging the approach described, a single set of content sources can serve devices with different capabilities. Thus, the techniques described herein include dynamically transmuxing content to other file formats in a manner that is transparent to the content provider and the end user.

The intermediate format may be based on or adapted from any convenient multimedia file format that can be used for delivery and playback of multimedia content. These include, without limitation, fragmented MP4, the Protected Interoperable File Format (PIFF), and others. More generally, any linked list-based file format may be used.

Preferably, the CDN service provider provides an extranet (a web-based portal) through which the stream delivery is provisioned.

Each above-described process preferably is implemented in computer software as a set of program instructions executable in one or more processors, as a special-purpose machine.

Representative machines on which the subject matter herein is provided may be Intel Pentium-based computers running a Linux or Linux-variant operating system and one or more applications to carry out the described functionality. One or more of the processes described above are implemented as computer programs, namely, as a set of computer instructions, for performing the functionality described.

While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.

While the disclosed subject matter has been described in the context of a method or process, the subject matter also relates to apparatus for performing the operations herein. This apparatus may be a particular machine that is specially constructed for the required purposes, or it may comprise a computer otherwise selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including an optical disk, a CD-ROM, and a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), a magnetic or optical card, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. A given implementation of the present invention is software written in a given programming language that runs in conjunction with a DNS-compliant name server (e.g., BIND) on a standard Intel hardware platform running an operating system such as Linux. The functionality may be built into the name server code, or it may be executed as an adjunct to that code. A machine implementing the techniques herein comprises a processor and computer memory holding instructions that are executed by the processor to perform the above-described methods.

While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.

Having described our invention, what we now claim is as follows:
1. A method of recording a stream, comprising: receiving the stream in a source format; demuxing the stream from the source format into intermediate format file fragments; recording the intermediate format file fragments demuxed from the stream; caching a predetermined number of intermediate format file fragments to a temporary data storage unit for live playback of the stream; and archiving the intermediate format file fragments to a persistent data store after a given time period associated with the live playback.
2. The method as described in claim 1 wherein the stream is received in one or more versions.
3. The method as described in claim 2 wherein the one or more versions comprise distinct bitrate versions of the stream in the source format.
4. The method as described in claim 3 wherein the bitrate versions are multiplexed over a single connection.
5. The method as described in claim 1 wherein the stream is received from an encoder.
6. The method as described in claim 1 wherein the intermediate format is a fragmented MP4 format.
7. The method as described in claim 1 further including determining an optimal size of an intermediate format file fragment.